Cleansing Wikipedia Categories using Centrality

Authors

  • Paolo Boldi
  • Corrado Monti
Abstract

We propose a novel, general technique for pruning and cleansing the Wikipedia category hierarchy, with a tunable level of aggregation. Our approach is endogenous: it does not use any information from Wikipedia articles and is based solely on the user-generated (noisy) Wikipedia category folksonomy itself. We show how the proposed technique can help reduce the level of noise in the hierarchy and discuss how alternative centrality measures affect the result differently.


Similar resources

Topic Identification Using Wikipedia Graph Centrality

This paper presents a method for automatic topic identification using a graph-centrality algorithm applied to an encyclopedic graph derived from Wikipedia. When tested on a data set with manually assigned topics, the system is found to significantly improve over a simpler baseline that does not make use of the external encyclopedic knowledge.


Assessing the Quality of Wikipedia Pages Using Edit Longevity and Contributor Centrality

In this paper we address the challenge of assessing the quality of Wikipedia pages using scores derived from edit contribution and contributor authoritativeness measures. The hypothesis is that pages with significant contributions from authoritative contributors are likely to be high-quality pages. Contributions are quantified using edit longevity measures and contributor authoritativeness is s...


Customer Knowledge and Service Development, the Web 2.0 Role in Co-production

The paper is concerned with relationships between SSME and ICTs and focuses on the role of Web 2.0 tools in the service development process. The research presented aims at exploring how collaborative technologies can support and improve service processes, highlighting customer centrality and value coproduction. The core idea of the paper is the centrality of user participation and the collabora...


The web mirrors value in the real world: comparing a firm's valuation with its web network position

This paper compares a firm’s innovation and performance with its online Web presence measured through the Web network structure. 489 firms in five different industries listed on the United States and Chinese stock markets are investigated. Using Web link data collected from Bing, blogs, Twitter and Wikipedia, we find a positive correlation between the betweenness centrality of a firm in the Web netwo...


Evaluating authoritative sources using social networks: an insight from Wikipedia

Purpose – The purpose of this paper is to present an approach to evaluating contributions in collaborative authoring environments, and in particular wikis, using social network measures. Design / methodology / approach – A social network model for Wikipedia has been constructed and metrics of importance such as centrality have been defined. Data have been gathered from articles belonging to the s...




Publication date: 2016